Construction of bipartite and unipartite weighted networks from collections of journal papers

This work presents a model that allows the study of research specialties through the manifestations of the specialty's social and epistemological processes in a collection of journal papers. Collections of papers are modeled as coupled bipartite networks interlinking 7 types of entities. Matrix-based link weight functions are introduced to calculate weighted bipartite networks and weighted unipartite co-occurrence networks in the collection of papers. These weight calculation methods, when used in conjunction with unweighted bipartite growth models, produce simple growth models for weighted networks in collections of papers.



I. INTRODUCTION 

A collection of journal papers is a database of papers that comprehensively samples the journal literature of a scientific specialty. As such, the social and epistemological processes of the specialty are manifested in the complex network of linkages among entities within the collection of papers. These manifestations are studied by bibliometricians and subject-matter experts to assess the state of research in a specialty, and such studies are used to advise managers and policy makers in both government and industry to facilitate research management.

It is important to develop both complex network models and network analysis tools that can be applied to collections of papers. Such tools are needed to predict how the underlying processes of a research specialty are manifested in a collection of papers and, more importantly, to solve the inverse problem of modeling research specialty processes from their manifestations in collections of papers. Examples of useful information about research specialties to be extracted from collections of papers include: 1) social structures, such as research teams, groups of experts, and leaders of 'schools of thought'; 2) knowledge structure, such as research subtopics, base knowledge, and exemplars; and 3) temporal trends and events, such as discoveries, emergence of new specialties and research teams, knowledge accretion, and creation and obsolescence of concepts and exemplars.

This paper introduces a structural model of coupled networks in collections of journal papers and proposes a construction method for bipartite and unipartite weighted networks from such collections. The methods presented here constitute an important step in the effort to apply the developing science of complex networks to collections of papers and, eventually, to the study of scientific specialties as complex social networks and knowledge networks.

As complex networks, collections of papers have three distinguishing characteristics: 1) they are formed from coupled networks of many different types of entities, e.g., papers, references, and authors; 2) both unipartite and bipartite networks in collections of papers are best expressed as weighted networks, where the strength of linkage between pairs of entities is expressed as a positive real link weight; and 3) collections of papers are best represented as collections of bipartite networks.

To date, the phenomenon of coupled networks has received little attention in the physics literature. Zheng and Ergun [50] model the simultaneous growth of two loosely coupled sections of a unipartite network and show conditions for power-law link distributions in the crosslinks between network sections. Börner et al. model the simultaneous growth of citation networks and author collaboration networks by modeling the behavior of authors [12].

In contrast to the paucity of research on coupled networks, a great deal of recent study has focused on weighted networks. Yook et al. [49] originally investigated growing weighted networks using preferential attachment rules and random attachment rules. Newman [38] showed that weighted networks can be expressed as multigraphs, and explained how this treatment allows many analysis techniques for unweighted networks to be generalized to weighted networks. Barrat et al. [7] studied a large weighted author collaboration network and the weighted world airline network, and showed that these networks differ in the correlations of node degree with strength and with clustering. Other studies focus on the statistical properties of weighted networks [3, 9, 11, 26], transport models of weighted networks [5, 21, 22], or growth models of weighted networks [4, 8, 10, 16, 20]. Fan et al. [19] and Li et al. [30] gathered a collection of papers on the specialty of econophysics and studied a weighted unipartite collaboration network of authors from that collection.

On the topic of bipartite networks, several recent papers have reported on structural models and growth models. Ergun [18] models the human sexual contact network as a bipartite graph whose growth follows preferential attachment rules similar to a Yule process. Ramasco et al. present a bipartite Yule model for paper to author networks [39]. Guillaume and Latapy [24] also present a bipartite Yule model and propose a method of deriving a bipartite expression of any unipartite network. Morris [33] proposes the use of general bipartite Yule processes for entity-type pairs in collections of journal papers, and gives examples for paper to reference networks and paper to author networks. Morris [32] also gives a detailed analysis of a bipartite Yule model for paper to reference networks that models heavily cited exemplar references in emerging specialties. Goldstein et al. [23] and Morris et al. [35] propose bipartite Yule models for paper to author networks that model the success-breeds-success phenomenon for teams of authors.

FIG. 1: Diagram showing a collection of papers as a series of coupled bipartite networks.

As shown in Figure 1, a collection of journal papers constitutes a series of coupled bipartite networks. As diagrammed in the figure, a collection of papers contains 6 direct bipartite networks: 1) papers to paper authors, 2) papers to references, 3) papers to paper journals, 4) papers to terms, 5) references to reference authors, and 6) references to reference journals. Additionally, the diagram defines 15 indirect bipartite networks, one for each of the remaining pairs among the 21 possible pairs of the 7 entity-types. Examples of interesting indirect networks are the paper author to reference author and paper journal to reference journal networks, which can be used for author co-citation analysis [46] and journal co-citation analysis [31], respectively.
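The count of 15 indirect networks can be checked mechanically. In the hypothetical Python sketch below, the six direct networks are encoded as undirected edges between entity-type labels (the abbreviations of Table I), and the remaining pairs are counted. Because the six edges connect all seven entity-types without forming a cycle, each indirect pair is joined by exactly one chain of direct networks; a breadth-first search recovers that chain, which is the cascade used to weight the indirect network.

```python
from itertools import combinations
from collections import deque

# The six direct bipartite networks, as edges between entity-type labels.
DIRECT = {
    ("p", "ap"), ("p", "r"), ("p", "jp"), ("p", "t"),
    ("r", "ar"), ("r", "jr"),
}
TYPES = ["p", "ap", "jp", "t", "r", "ar", "jr"]

def neighbours(x):
    """Entity-types that share a direct bipartite network with x."""
    return [b if a == x else a for a, b in DIRECT if x in (a, b)]

def cascade(x1, xn):
    """Breadth-first search for the chain of entity-types linking x1 to xn."""
    prev = {x1: None}
    queue = deque([x1])
    while queue:
        x = queue.popleft()
        for y in neighbours(x):
            if y not in prev:
                prev[y] = x
                queue.append(y)
    path, x = [], xn
    while x is not None:          # walk back from xn to x1
        path.append(x)
        x = prev[x]
    return path[::-1]

pairs = list(combinations(TYPES, 2))
direct = [pr for pr in pairs if pr in DIRECT or pr[::-1] in DIRECT]
indirect = [pr for pr in pairs if pr not in direct]
print(len(pairs), len(direct), len(indirect))   # 21 6 15
print(cascade("ap", "ar"))                       # ['ap', 'p', 'r', 'ar']
```

The chain found for paper authors and reference authors is the same cascade that is used later for author co-citation analysis.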

This paper introduces a formal matrix-based treatment of coupled bipartite structures in collections of papers. This treatment is used to calculate the weights of indirect bipartite networks and is extended to the calculation of weights of unipartite co-occurrence networks in the collection. For example, the proposed method can be used to calculate the weights of a bipartite paper author to reference network, or to find the weights of the unipartite co-occurrence network of authors that link to common papers (a co-authorship network).

The proposed matrix-based technique is similar to multi-port analysis using ABCD parameters in electrical networks [13]. The method is also very similar to methods used in multi-layer neural networks [25].

In conjunction with simple bipartite Yule growth models [33], the proposed weight calculation method produces simple models of weighted network growth, since the weights grow from the unweighted direct links that occur as papers are added to the collection.



II. COLLECTIONS OF JOURNAL PAPERS 
A. Research specialties 

A research specialty is a self-organized social group whose members tend to study a common research topic, attend the same conferences, publish in the same journals, cite each other's work, and belong to the same social networks, known as invisible colleges [15]. Thomas Kuhn, the pioneer of the study of research processes, considered specialties to be quite small, "100 members, sometimes considerably less" [29].

The processes that drive research specialties are twofold: 1) social processes of research teams, communication networks, and collaboration, and 2) epistemological processes of the discovery, emergence, accretion, and obsolescence of knowledge. As described by Kuhn, the distinguishing feature of a specialty is its paradigm, which is the researchers' "way of thinking" about their problem: models, analytical techniques, validation standards, and so forth. Progress in a specialty is characterized by long, stable periods of puzzle-solving within the specialty's paradigm, punctuated by discoveries that accompany the overthrow of old paradigms and/or the creation of new ones [29]. This characteristic of specialties is similar to the punctuated equilibrium phenomena [17] that characterize self-organizing systems [6].

Specialties create their own literature, i.e., a body of journal papers and books that broadly focus on the specialty's research topic. We define a collection of papers as a list of journal papers that constitutes a comprehensive sample of a specialty's journal literature. As a working definition, a collection of papers is a database of records, one record per paper, that contains information about the individual papers in such a list.

Although such collections vary widely in size, they are much smaller than the immense databases of papers often studied in the physics literature. Using back-of-the-envelope approximations, Morris [32] suggests that collections of papers should range from as few as 100 papers to as many as 5000 papers. Huge heterogeneous datasets, such as the SPIRES database [40], 20 years of PNAS papers [12], 100 years of Physical Review journals [41], or all the chemistry publications of the Netherlands [45], are not collections of papers as defined here, because each samples the literature of more than one specialty. Despite this conceptual constraint, the weight calculation method proposed here can still be applied to such huge collections.



B. Definition of collections of journal papers 

For the purposes of this paper, a collection of journal papers is a database in which each record corresponds to a journal paper. For each paper, its authors, cited references, journal, index terms, and publication year are listed. Furthermore, for each reference, a reference author, reference journal, and reference year are listed. As defined here, collections of papers are constructed to comprehensively sample the literature of a scientific specialty. For our purposes, collections of papers are typically downloaded from the Science Citation Index using Thomson ISI's Web of Science product [51]. Queries and seed references are used to gather topic-specific collections that cover a specialty. The records for these papers are typically collected into text files in a tagged file format and downloaded for analysis. To demonstrate the concepts proposed in this paper, a fictitious collection of four papers covering the fictional specialty of improbability generation is given in the Appendix. (Apologies to humor author Douglas Adams.) This example collection is provided to allow readers to follow the extraction of entities and links from the source data of the collection. For illustrative purposes, the entities in this example are more densely linked than would normally be found in such a small collection of papers.

A collection of papers can be considered as a network of bibliographic entities of various entity-types [36]. Bibliographic entities may correspond to physical entities in the real world, and more than one bibliographic entity may correspond to the same physical entity. For example, a paper and a reference in a collection of papers may both correspond to the same physical paper in the real world.

It is common in studies of networks in journal literature to match references to papers to build a model of "papers citing papers", usually referred to as a citation network [2]. There are both methodological and theoretical reasons to avoid this type of treatment. 1) First, a collection of papers typically has 20 times more references than papers, which makes such citation network models grossly incomplete, because unmatched papers and references (including references corresponding to books) have unknown incoming and outgoing links. 2) Second, references, especially highly cited references, can be considered as concept symbols [32, 44], and therefore should be treated as an entity-type separate from papers, which merely represent undifferentiated research reports. Figuratively, it is inappropriate to use an "apples-citing-apples" model when the actual network is "apples-citing-oranges." Further discussion of citation networks is outside the scope of this paper.



For the structural model of collections of journal papers presented in this paper, we limit our discussion to a model comprising 7 entity-types: 1) papers, 2) paper authors, 3) paper journals, 4) index terms, 5) references, 6) reference authors, and 7) reference journals. Index terms are terms supplied by authors or abstracting services and associated with papers for search and classification purposes. Paper authors are the authors of papers, while reference authors are the authors associated with references. Paper journals are the journals in which papers are published, while reference journals are the journals associated with references. References corresponding to books, films, web pages, and eprint archive articles have no associated reference journal.
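One way to hold such records in a program is sketched below. This is a hypothetical Python layout, not the Web of Science tagged format: the class and field names are our own, `key` is an assumed unique label for a cited work, and a reference's `journal` is `None` when, as noted above, the reference has no associated journal.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass(frozen=True)          # frozen, so references can serve as dict keys
class Reference:
    key: str                     # hypothetical unique label for the cited work (r)
    author: str                  # reference author (ar)
    journal: Optional[str]       # reference journal (jr); None for books, films, web pages, eprints
    year: int                    # reference year (yr)

@dataclass
class Paper:
    authors: List[str]           # paper authors (ap)
    journal: str                 # paper journal (jp)
    year: int                    # paper year (yp)
    terms: List[str] = field(default_factory=list)             # index terms (t)
    references: List[Reference] = field(default_factory=list)  # cited references (r)

# Hypothetical record: one paper citing a journal article and a book.
article = Reference("ref-1", "Jones", "J Alpha", 1978)
book = Reference("ref-2", "Smith", None, 1979)
paper = Paper(["Lee"], "J Beta", 1980, ["improbability"], [article, book])
```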

Using the 7 entity-types given in our structural model, Figure 1 illustrates that a collection of journal papers constitutes a series of coupled bipartite networks. As noted in Section I, there are 6 direct bipartite networks and 15 indirect bipartite networks in this structural model. These indirect bipartite networks are best analyzed as weighted networks, and those weights can be calculated from the paths of direct links that connect entities in the two partitions of interest.

Consider the fictitious collection of papers in the Appendix. The source file for this collection of 4 papers is listed in ISI tagged file format (see footnote [52]). The entities extracted from this collection consist of 4 papers, 3 paper authors, 4 paper journals, 7 index terms, 10 references, 6 reference authors, and 7 reference journals. These entities and their corresponding index numbers are listed in the Appendix.
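The extraction of entities and their index numbers can be sketched as follows. This hypothetical Python function assumes each paper record has already been parsed into a dictionary (the keys are our own, not ISI field tags) and numbers entities in order of first appearance; that ordering is an assumption for illustration and may differ from the numbering used in the Appendix.

```python
def index_entities(papers):
    """Assign 1-based index numbers to the entities of each entity-type,
    in order of first appearance (an illustrative convention only).

    Each paper is a dict with keys "authors", "journal", "terms" and "refs";
    each ref is a (ref_label, ref_author, ref_journal) tuple, where
    ref_journal is None for references without a journal (e.g. books).
    """
    idx = {k: {} for k in ("p", "ap", "jp", "t", "r", "ar", "jr")}

    def add(kind, name):
        if name is not None and name not in idx[kind]:
            idx[kind][name] = len(idx[kind]) + 1

    for n, paper in enumerate(papers, start=1):
        add("p", n)                      # papers are numbered in record order
        for author in paper["authors"]:
            add("ap", author)
        add("jp", paper["journal"])
        for term in paper["terms"]:
            add("t", term)
        for label, ref_author, ref_journal in paper["refs"]:
            add("r", label)
            add("ar", ref_author)
            add("jr", ref_journal)       # skipped when None
    return idx

# Two hypothetical, already-parsed records.
papers = [
    {"authors": ["A", "B"], "journal": "J1", "terms": ["x"],
     "refs": [("r1", "X", "JR1"), ("r2", "Y", None)]},
    {"authors": ["B"], "journal": "J2", "terms": ["x", "y"],
     "refs": [("r1", "X", "JR1"), ("r3", "X", "JR2")]},
]
idx = index_entities(papers)
print({kind: len(ents) for kind, ents in idx.items()})
# {'p': 2, 'ap': 2, 'jp': 2, 't': 2, 'r': 3, 'ar': 2, 'jr': 2}
```

Note that a reference cited by both papers (`r1`) becomes a single reference entity, and a reference without a journal contributes no reference journal entity.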



III. BIPARTITE NETWORKS IN COLLECTIONS OF JOURNAL PAPERS

A. Dyad definitions 

A dyad is a pair of entities. In a dyad, the two entities can be 1) like entities, that is, entities of the same entity-type, or 2) unlike entities, that is, entities of different entity-types. Direct links are defined as direct associations. A paper has direct links to its authors (paper authors), its associated index terms, the references it cites, and the journal in which it was published. A reference is directly linked to the papers that cite it, the author associated with the reference (reference author), and the journal associated with the reference (reference journal).

Indirect links are links between two unlike entities that occur over a path of two or more direct links. For example, a paper author is indirectly linked to a reference author if he or she authors a paper that cites a reference associated with that reference author.

The first entity of interest in a dyad is the primary entity, while the other entity is the secondary entity. Designation of the primary and secondary entity-types in direct and indirect bipartite networks is arbitrary and is assumed to be based on the interest of the investigator.
TABLE I: Variable conventions used for entities in collections of papers.

p:  paper                 r:  reference
ap: paper author          ar: reference author
jp: paper journal         jr: reference journal
yp: paper year            yr: reference year
t:  term                  x:  unspecified entity

Prefix 'n' to any entity variable to denote the number of entities of that entity-type in the collection; e.g., np denotes the number of papers in the collection.



For co-occurrence networks, the primary and secondary entity-types are explicitly defined, as will be explained in Section IV F. Co-occurrence links are links between like primary entities; they occur when both entities link to the same secondary entity. For example, two papers have a co-occurrence link when they both cite a common reference, and two paper authors have a co-occurrence link if they coauthor a paper. In co-occurrence links, the like entities of the dyad are the primary entities, while the unlike entities to which they co-link are the secondary entities.
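For the co-authorship example, the co-occurrence idea can be sketched in a few lines of Python. This is a plain count of shared secondary entities, used only for illustration; it is not claimed to be the weighting defined in Section IV F. The paper and author labels are hypothetical.

```python
from collections import Counter
from itertools import combinations

def cooccurrence_counts(links):
    """Count, for each unordered pair of like primary entities, how many
    secondary entities they both link to.

    `links` maps each secondary entity to the set of primary entities
    linked to it (here: paper -> its paper authors).
    """
    counts = Counter()
    for primaries in links.values():
        for a, b in combinations(sorted(primaries), 2):
            counts[(a, b)] += 1
    return counts

# Hypothetical papers and their authors.
paper_authors = {"p1": {"A", "B"}, "p2": {"A", "B", "C"}, "p3": {"C"}}
counts = cooccurrence_counts(paper_authors)
print(counts[("A", "B")])   # 2: A and B coauthor two papers
```

A single-author paper such as `p3` contributes no co-occurrence link.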



B. Dyad identifier notation 

Table I lists the conventions used here to denote entity-type variables within a collection of papers. The variables $x_1$, $x_2$, and so forth will be used to denote unspecified entity-types. Dyad notation is used to specify dyad types in the collection of papers. The symbols of the primary and secondary entity-types associated with a dyad are separated by a comma and placed between square brackets, e.g., $[x_1,x_2]$, where $x_1$ denotes the primary entity-type and $x_2$ denotes the secondary entity-type. This notation will be referred to as the dyad identifier, and will be used as a suffix to variables to specify the entity-types of interest. However, the dyad identifier will be dropped to reduce clutter in the notation when the primary and secondary entity-types are obvious from context. Some examples of the use of dyad identifiers:

• $O[p,r]$ denotes an occurrence matrix listing the links of papers, the primary entity-type, to references, the secondary entity-type.

• $C[ap,p]$ denotes the co-occurrence matrix listing the co-authorship counts of pairs of paper authors, the primary entity-type, in papers, the secondary entity-type.



C. Bipartite networks 

Bipartite networks comprise two distinct partitions of nodes, where all links in the network run from entities in the first partition to entities in the second partition. For our purposes, the first partition exclusively holds entities of one entity-type, while the other partition exclusively holds entities of another entity-type. As an example, Figure 2 shows a diagram of a bipartite network of a partition of papers linked to a partition of references. Note that links only occur between papers and references, and that there are no links between pairs of papers or pairs of references.

FIG. 2: A collection of papers and references as a bipartite network. References are linked to papers in which they are cited.

Assume the diagrammatic convention shown in Figure 3: the entities of $x_1$, the primary entity-type, are the entities in the group on the left, and the entities of $x_2$, the secondary entity-type, are the entities in the group on the right. There are $nx_1$ primary entities and $nx_2$ secondary entities. The strength of the link between $x_1$ entity $i$ and $x_2$ entity $j$ is the link weight, $o_{ij}[x_1,x_2]$.



D. Occurrence matrices 

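Mathematically, the links in a bipartite network are described by a rectangular adjacency matrix, which we define as an occurrence matrix. This is an $nx_1$ by $nx_2$ matrix that lists all the link weights between the entities of the two partitions:

$$O[x_1,x_2] = \begin{bmatrix} o_{11} & o_{12} & \cdots & o_{1\,nx_2} \\ o_{21} & o_{22} & \cdots & o_{2\,nx_2} \\ \vdots & \vdots & \ddots & \vdots \\ o_{nx_1\,1} & o_{nx_1\,2} & \cdots & o_{nx_1\,nx_2} \end{bmatrix} \qquad (1)$$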



Figure 3 shows how the links in a bipartite network correspond to elements in its occurrence matrix. There is a bipartite network for every possible pair of entity-types in the collection of papers. Occurrence matrices for entity-type pairs with direct relations are derived directly from the tables in the collection's database. For the example collection of papers discussed in this paper, the occurrence matrices for the 6 direct bipartite networks in the collection are given in the Appendix. Occurrence matrices for entity-type pairs with indirect links are calculated by cascading bipartite networks of direct links, as will be shown later.

FIG. 3: Diagram of a general bipartite network and conventions for labeling link weights in the occurrence matrix of the network.

Note the following property of occurrence matrices:

$$O[x_1,x_2] = O[x_2,x_1]^{T} \qquad (2)$$

Using dyad identifier notation, exchanging the variables is equivalent to transposing the occurrence matrix.
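Building an occurrence matrix from a list of direct links, and the transpose property of Equation (2), can be checked with the small Python sketch below. The paper and reference labels are hypothetical, and entry (i, j) simply counts the listed links from the i-th primary entity to the j-th secondary entity.

```python
def occurrence_matrix(links, rows, cols):
    """Return O[x1, x2] as a list of lists: entry (i, j) counts the links
    from the i-th x1 entity to the j-th x2 entity."""
    row_pos = {e: i for i, e in enumerate(rows)}
    col_pos = {e: j for j, e in enumerate(cols)}
    O = [[0] * len(cols) for _ in rows]
    for a, b in links:
        O[row_pos[a]][col_pos[b]] += 1
    return O

def transpose(M):
    return [list(col) for col in zip(*M)]

# Hypothetical direct paper -> reference links.
papers = ["p1", "p2"]
refs = ["r1", "r2", "r3"]
p_r = [("p1", "r1"), ("p1", "r2"), ("p2", "r2"), ("p2", "r3")]

O_p_r = occurrence_matrix(p_r, papers, refs)
O_r_p = occurrence_matrix([(b, a) for a, b in p_r], refs, papers)
print(O_p_r)                        # [[1, 1, 0], [0, 1, 1]]
assert O_p_r == transpose(O_r_p)    # Equation (2)
```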



E. Coupled and cascaded bipartite networks 

Coupled bipartite networks are pairs of bipartite networks that share a common partition. Figure 4 shows an author to paper network coupled to a paper to reference network through common papers, using the example collection of papers in the Appendix. Cascaded bipartite networks comprise a series of two or more coupled bipartite networks. Figure 5 shows an example of such a cascade, where a reference author to reference network is coupled to a reference to paper network that is, in turn, coupled to a paper to paper author network. We define the extreme left and right partitions as the outer partitions and all other partitions as the inner partitions.

Assume that we are interested in describing the links between two different types of entities as a weighted bipartite network. We first find a cascade of networks in which the two entity-types of interest are the outer partitions. Then it is necessary to apply some algorithm that meaningfully reduces the indirect links between pairs of opposite outer entities to weights in a bipartite network joining those outer entities. Intuitively, we want pairs of outer entities that have many indirect links through the inner partitions to have more weight than pairs of outer entities with few or no connecting links.

For example, suppose that we wish to find a weighted bipartite network between reference authors and paper authors for the purpose of conducting author co-citation analysis [46]. We can find a cascade of bipartite networks, as shown in Figure 5, where reference authors are linked to their references, the references are linked to the papers that cite them, and those papers are linked to the paper authors that authored them. The weights of a bipartite network of reference authors to paper authors are found by identifying the indirect links between each reference author and paper author through references and papers, and applying an algorithm that produces a weight from those identified indirect links. The more indirect links there are between a reference author and a paper author, the more weight should be assigned to the link between them in the resulting bipartite network.
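Counting the indirect links of this example can be made concrete with a short Python sketch over a hypothetical collection of two papers. Using the raw number of paths as the weight is only one possible choice; for unweighted networks it agrees with the matrix-multiplication link weight function described in Section IV C.

```python
from collections import Counter

# Hypothetical direct links of a tiny collection.
ref_author = {"r1": "X", "r2": "Y", "r3": "X"}          # reference -> reference author
paper_refs = {"p1": ["r1", "r2"], "p2": ["r1", "r3"]}   # paper -> references it cites
paper_authors = {"p1": ["A"], "p2": ["A", "B"]}         # paper -> paper authors

def indirect_link_counts():
    """Count the paths ar -> r -> p -> ap for every (reference author,
    paper author) pair; more paths means a heavier link."""
    counts = Counter()
    for paper, refs in paper_refs.items():
        for ref in refs:
            for author in paper_authors[paper]:
                counts[(ref_author[ref], author)] += 1
    return counts

counts = indirect_link_counts()
print(counts[("X", "A")], counts[("X", "B")], counts[("Y", "A")])   # 3 2 1
```

Reference author `X` reaches paper author `A` through three distinct paths, so that pair receives the heaviest link.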



IV. ALGORITHM FOR CONSTRUCTION OF WEIGHTED BIPARTITE NETWORKS

A. Reducing a cascade of bipartite networks to a single weighted bipartite network

Given a cascade of bipartite networks with occurrence matrices $O[x_1,x_2], O[x_2,x_3], \ldots, O[x_{n-1},x_n]$, this cascade can be reduced to a single bipartite network with occurrence matrix $O[x_1,x_n]$ listing the link weights between the $x_1$ entities and the $x_n$ entities in the network. The proposed weight algorithm is iterative: it reduces two adjacent networks to a single weighted network, then reduces that weighted network and its adjacent network, and continues in this way until only a single bipartite network remains.

FIG. 5: An example of a cascade of bipartite networks. A reference author to reference network is coupled to a reference to paper network that is, in turn, coupled to a paper to paper author network.

The algorithm is based on a generalized form of matrix arithmetic. Given a pair of opposite outer entities, the algorithm finds all unique paths from the left outer entity to the right outer entity and assigns a weight to each of those paths. The weights of these parallel paths are then combined to calculate the weight of the link between the two entities.
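The iterative reduction can be written as a left fold over the list of occurrence matrices. In the hypothetical Python sketch below, `pair_reduce` is any function that reduces two adjacent occurrence matrices to one; it defaults to ordinary matrix multiplication, which is only one of the link weight functions described in the following subsections. The example matrices encode a hypothetical cascade of reference authors, references, papers, and paper authors, as in Figure 5.

```python
from functools import reduce

def matmul(A, B):
    """Ordinary product of two list-of-lists matrices."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def reduce_cascade(matrices, pair_reduce=matmul):
    """Reduce O[x1,x2], O[x2,x3], ..., O[x_{n-1},x_n] to O[x1,x_n] by
    repeatedly replacing the two left-most networks with a single one."""
    return reduce(pair_reduce, matrices)

# Hypothetical cascade: reference authors [X, Y] -> references [r1, r2, r3]
# -> papers [p1, p2] -> paper authors [A, B].
O_ar_r = [[1, 0, 1], [0, 1, 0]]
O_r_p = [[1, 1], [1, 0], [0, 1]]
O_p_ap = [[1, 0], [1, 1]]

print(reduce_cascade([O_ar_r, O_r_p, O_p_ap]))   # [[3, 2], [1, 0]]
```

`functools.reduce` evaluates `pair_reduce(pair_reduce(O_ar_r, O_r_p), O_p_ap)`, mirroring the step-by-step reduction described above.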



FIG. 6: Diagram of adjacent bipartite networks and conventions for naming entities and links.

B. Reducing adjacent coupled bipartite networks to a single weighted bipartite network

Consider a pair of coupled bipartite networks with entity-types $x_1$, $x_2$, and $x_3$, as shown in Figure 6. Occurrence matrices $O[x_1,x_2]$ and $O[x_2,x_3]$ enumerate the links in the two bipartite networks in this figure. Each link in the figure is labeled with its corresponding occurrence matrix element. There are $nx_1$, $nx_2$, and $nx_3$ entities of the entity-types $x_1$, $x_2$, and $x_3$, respectively. A pair of links that connects an $x_1$ entity to an $x_3$ entity is defined as a path. Figure 7, part (a), shows a path from $x_1$ entity $i$ to $x_3$ entity $j$, connected through $x_2$ entity $k$ by links $o_{ik}[x_1,x_2]$ and $o_{kj}[x_2,x_3]$. There are $nx_2$ possible paths from $x_1$ entity $i$ to $x_3$ entity $j$, as shown in Figure 7, part (b).

The path weight associated with a path is calculated from the weights of the path's two links using a path weight function:

$$p_{ij}(k) = f_2\left(o_{ik}[x_1,x_2],\, o_{kj}[x_2,x_3]\right), \qquad (3)$$

where $f_2$ is the path weight function, to be defined later. The resulting link weight from $x_1$ entity $i$ to $x_3$ entity $j$ is calculated from the path weights of all possible paths between those two entities using a path combining function:

$$o_{ij}[x_1,x_3] = f_1\left(p_{ij}(1),\, p_{ij}(2),\, \ldots,\, p_{ij}(nx_2)\right), \qquad (4)$$

where $f_1$ is the path combining function, to be defined later. Substituting Equation (3) into Equation (4) gives the link weight function, which defines the rules for calculating link weights of cascaded bipartite networks:

$$o_{ij}[x_1,x_3] = f_1\left(f_2(o_{i1}, o_{1j}),\, f_2(o_{i2}, o_{2j}),\, \ldots,\, f_2(o_{i\,nx_2}, o_{nx_2\,j})\right). \qquad (5)$$

FIG. 7: (a) Example path between $x_1$ entity $i$ and $x_3$ entity $j$ through $x_2$ entity $k$. (b) The $nx_2$ possible paths between $x_1$ entity $i$ and $x_3$ entity $j$ through $x_2$ entities.

FIG. 8: Diagram illustrating vector operation of the link weight function.

The link weight function of Equation (5) is a matrix function that is used to compute all $nx_1 \times nx_3$ weights of the occurrence matrix $O[x_1,x_3]$ according to the rules for weight computation given by $f_1$ and $f_2$. Figure 8 illustrates how the link weight function uses row $i$ of $O[x_1,x_2]$ and column $j$ of $O[x_2,x_3]$ to produce element $o_{ij}$ of matrix $O[x_1,x_3]$. As shown, the function $f_2$ is applied to matching elements of the row vector and column vector to produce $nx_2$ scalar results. The function $f_1$ operates on all these $nx_2$ results to produce the final scalar result $o_{ij}[x_1,x_3]$.
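The row-by-column operation of Equation (5) translates directly into code. The Python sketch below takes the path weight function $f_2$ and the path combining function $f_1$ as ordinary callables; the matrices are hypothetical, and the choice `f1 = sum`, `f2 = mul` is shown because it reproduces ordinary matrix multiplication. Any other pair of callables can be passed in, and which pair is appropriate is application-driven.

```python
from operator import mul

def link_weight(A, B, f1, f2):
    """Link weight function of Equation (5).

    A is O[x1, x2] and B is O[x2, x3], both lists of lists. For each (i, j),
    f2 is applied to the matching elements of row i of A and column j of B,
    giving nx2 path weights, and f1 combines that list into o_ij[x1, x3].
    """
    if any(len(row) != len(B) for row in A):
        raise ValueError("columns of A must match rows of B (shared partition x2)")
    cols = list(zip(*B))  # columns of O[x2, x3]
    return [[f1([f2(a, b) for a, b in zip(row, col)]) for col in cols] for row in A]

A = [[1, 0, 2], [0, 1, 1]]      # hypothetical O[x1, x2]
B = [[1, 1], [0, 1], [3, 0]]    # hypothetical O[x2, x3]

# f1 = sum and f2 = multiplication reproduce ordinary matrix multiplication.
print(link_weight(A, B, sum, mul))   # [[7, 1], [3, 1]]
```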

The concepts of 1) bipartite networks of entities, 2) cascaded bipartite networks, and 3) link weight functions provide a systematic means of finding multiple indirect links between outer entities in cascades of bipartite networks, and of combining those multiple links into a weight in a bipartite network between the outer entities. The choice of path weight function and path combining function is generally driven by the application. In the case of cascades of unweighted bipartite networks, matrix multiplication makes a good link weight function because it yields weights that are equal to occurrence counts. For example, for a paper to reference network coupled to a reference to reference author network, matrix multiplication as a link weight function will produce weights, $o_{ij}[p,ar]$, that are the number of times paper $i$ cites reference author $j$.

In other situations, however, other link weight functions are more appropriate. For example, when reducing cascades of weighted bipartite networks, it is necessary to consider how to compute path weights from the two links in a path. Suppose we have a weighted bipartite network of linguistic terms to papers in a collection of papers. The weights, $o_{ij}[t,p]$, in this network are the number of times term $i$ appears in the body of paper $j$. Now assume this matrix is coupled to a paper to reference author network, and that there is a path from term $i$ to reference author $j$ that corresponds to 10 occurrences of term $i$ in paper $k$, which cites reference author $j$ 2 times. If we use multiplication as the path weight function, the path weight is $10 \times 2 = 20$, which has no meaning as an occurrence count between term $i$ and reference author $j$. In this case we may want to simply use a path weight equal to the number of times reference author $j$ is cited by paper $k$, or a path weight equal to the minimum of the number of times paper $k$ cites reference author $j$ and the number of times term $i$ occurs in paper $k$. We can also treat the two links in the path as electrical conductances and calculate the path weight as the resulting conductance of those two conductances in series.
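The arithmetic of this example can be laid out in a few lines of Python. The sketch compares the candidate path weights mentioned above for the hypothetical path with 10 term occurrences and 2 citations. The series-conductance value uses the textbook formula $1/(1/g_1 + 1/g_2)$ and is not the inverse Minkowski function, which has its own definition in the subsections that follow.

```python
def series_conductance(g1, g2):
    """Conductance of two conductances g1 and g2 connected in series."""
    return 1.0 / (1.0 / g1 + 1.0 / g2)

term_in_paper = 10     # term i occurs 10 times in paper k
paper_cites_ar = 2     # paper k cites reference author j twice

candidates = {
    "product": term_in_paper * paper_cites_ar,          # 20, not an occurrence count
    "citations only": paper_cites_ar,                   # 2
    "minimum": min(term_in_paper, paper_cites_ar),      # 2
    "series conductance": series_conductance(term_in_paper, paper_cites_ar),  # 10/6
}
for name, weight in candidates.items():
    print(f"{name}: {weight:g}")
```

The product overstates the link, while the minimum and the series conductance are both bounded by the weaker of the two links.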

The next three subsections describe three link weight functions: 1) matrix multiplication, appropriate for cascades of unweighted networks; 2) the overlap function, appropriate for cascades of weighted occurrence networks; and 3) the inverse Minkowski function, used to compute path weights in a manner similar to conductances in series.



C. Link weight function using matrix multiplication

For applications where at least one of the matrix arguments is binary, matrix multiplication is often used as the link weight function because it directly yields weights that are simple occurrence and co-occurrence counts in the resulting reduced bipartite matrix.

If the path weight function $f_2$ is defined as a product,

$$f_2\left(o_{ik}[x_1,x_2],\, o_{kj}[x_2,x_3]\right) = o_{ik}[x_1,x_2] \cdot o_{kj}[x_2,x_3], \qquad (6)$$

and the path combining function $f_1$ is a summation,

$$f_1\left(f_2(o_{i1}[x_1,x_2], o_{1j}[x_2,x_3]),\, \ldots,\, f_2(o_{i\,nx_2}[x_1,x_2], o_{nx_2\,j}[x_2,x_3])\right) = \sum_{k=1}^{nx_2} f_2\left(o_{ik}[x_1,x_2],\, o_{kj}[x_2,x_3]\right), \qquad (7)$$

then the link weight function is simply standard matrix multiplication:

$$o_{ij}[x_1,x_3] = \sum_{k=1}^{nx_2} o_{ik}[x_1,x_2] \cdot o_{kj}[x_2,x_3]. \qquad (8)$$

As an example, assume that x\, X2, and x 3 are paper 
authors, papers and references respectively, taken from 
the example collection of papers in the Appendix. The 
binary matrix O [ap, p] , the transpose of O [p, ap] , Equa- 
tion (A. 2), lists the links of the individual paper authors 
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to each paper, while the binary matrix 0[p, r], Equation 
(A.l), lists the links of individual papers with each ref- 
erence. Using matrix multiplication: 



$$O[ap,r] = O[ap,p]\cdot O[p,r] = \begin{bmatrix}1&1&0&0\\0&1&0&0\\0&1&1&1\end{bmatrix}\cdot\begin{bmatrix}1&1&1&0&0&0&0&0&0&0\\1&0&1&1&1&1&0&0&0&0\\1&0&1&0&1&0&1&1&0&0\\1&1&0&1&0&0&1&0&1&1\end{bmatrix} \qquad (9)$$

$$O[ap,r] = \begin{bmatrix}2&1&2&1&1&1&0&0&0&0\\1&0&1&1&1&1&0&0&0&0\\3&1&2&2&2&1&2&1&1&1\end{bmatrix} \qquad (10)$$

This is a matrix, O[ap,r], in which the weight $o_{ij}[ap,r]$ is the number of times that paper author i cites reference j.

Suppose we wish to find the paper author to reference author occurrence matrix of the example collection of papers in the Appendix. Consulting Figure 1, the direct links from paper authors to reference authors go from paper author to paper to reference to reference author. The occurrence matrix O[ap,ar] from paper author to reference author is therefore calculated by the matrix multiplication:

$$O[ap,ar] = O[ap,p]\cdot O[p,r]\cdot O[r,ar] \qquad (11)$$

In Equation (11), the primary entity-type of the first matrix is the desired primary entity-type (paper authors), the secondary entity-type of the last matrix is the desired secondary entity-type (reference authors), and the secondary entity-type of each matrix is matched to the primary entity-type of the matrix that follows it.

Using the example paper collection in the Appendix, first find the paper author to reference matrix by multiplying the paper author to paper matrix and the paper to reference matrix. This was done in Equation (10). Then multiply the paper author to reference matrix with the reference to reference author matrix:

$$O[ap,ar] = O[ap,r]\cdot O[r,ar] = \begin{bmatrix}2&1&2&1&1&1&0&0&0&0\\1&0&1&1&1&1&0&0&0&0\\3&1&2&2&2&1&2&1&1&1\end{bmatrix}\cdot\begin{bmatrix}1&0&0&0&0&0\\0&1&0&0&0&0\\0&1&0&0&0&0\\0&0&1&0&0&0\\0&0&1&0&0&0\\0&0&0&1&0&0\\0&0&0&1&0&0\\0&0&0&1&0&0\\0&0&0&0&1&0\\0&0&0&0&0&1\end{bmatrix} = \begin{bmatrix}2&3&2&1&0&0\\1&1&2&1&0&0\\3&3&4&4&1&1\end{bmatrix} \qquad (12)$$

The result in Equation (12) gives the desired occurrence matrix of paper authors to reference authors for the example. In this matrix, the weight $o_{ij}[ap,ar]$ is the number of times that paper author i cites reference author j.
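The two-step reduction in Equations (9) to (12) can be checked with a short plain-Python sketch. The helper `reduce_cascade` is a hypothetical name introduced here, not taken from the paper: it applies a path weight function (in the role of $f_2$) to every path from i through k to j, and a path combining function (in the role of $f_1$) to the resulting path weights, so passing multiplication and summation gives the standard matrix product of Equation (8). The matrices are transcribed from the Appendix example collection.

```python
from operator import mul


def reduce_cascade(A, B, path_weight, combine=sum):
    """Reduce a cascade O[x1,x2] (A), O[x2,x3] (B) to O[x1,x3].

    path_weight plays the role of the path weight function f_2 for the path
    i -> k -> j; combine plays the role of the path combining function f_1.
    """
    n_mid = len(B)
    if any(len(row) != n_mid for row in A):
        raise ValueError("inner dimensions of the cascade do not agree")
    return [
        [combine(path_weight(A[i][k], B[k][j]) for k in range(n_mid))
         for j in range(len(B[0]))]
        for i in range(len(A))
    ]


# Appendix example: rows/columns follow the numbering ap1..ap3, p1..p4, r1..r10, ar1..ar6.
O_ap_p = [
    [1, 1, 0, 0],  # ap1 Beeblebrox
    [0, 1, 0, 0],  # ap2 Dent
    [0, 1, 1, 1],  # ap3 Prefect
]
O_p_r = [
    [1, 1, 1, 0, 0, 0, 0, 0, 0, 0],  # p1
    [1, 0, 1, 1, 1, 1, 0, 0, 0, 0],  # p2
    [1, 0, 1, 0, 1, 0, 1, 1, 0, 0],  # p3
    [1, 1, 0, 1, 0, 0, 1, 0, 1, 1],  # p4
]
# Each reference r1..r10 has a single author; this lists the ar index of each one.
ref_author = [0, 1, 1, 2, 2, 3, 3, 3, 4, 5]
O_r_ar = [[1 if a == j else 0 for j in range(6)] for a in ref_author]

O_ap_r = reduce_cascade(O_ap_p, O_p_r, mul)    # Equation (10)
O_ap_ar = reduce_cascade(O_ap_r, O_r_ar, mul)  # Equation (12)
for row in O_ap_ar:
    print(row)
# [2, 3, 2, 1, 0, 0]
# [1, 1, 2, 1, 0, 0]
# [3, 3, 4, 4, 1, 1]
```

Because the product/sum link weight function is ordinary matrix multiplication, it is associative: reducing O[p,r]·O[r,ar] first and then multiplying by O[ap,p] yields the same O[ap,ar].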



D. Link weight function using the overlap function 

The overlap function is useful for calculating weights of 
links when reducing cascades of weighted bipartite net- 
works. This is appropriate for calculating bipartite net- 
works involving linguistic terms, and is also useful for 
calculating weights in co-occurrence networks of refer- 
ence authors and reference journals. 

Think of the two links in a path as conduits, each with 
a maximum capacity. The maximum capacity of these 
two conduits in series is equal to that of the conduit with 
the smallest capacity. Considering this series capacity as 
the path weight, the path weight function becomes the 
minimum of the weights of the two links on the path: 

$$f_2\big(o_{ik}[x_1,x_2],\,o_{kj}[x_2,x_3]\big) = \min\big(o_{ik}[x_1,x_2],\,o_{kj}[x_2,x_3]\big) \qquad (13)$$

Using a path combining function that sums the path 
weights: 

$$f_1 = \sum_{k=1}^{n_{x_2}} f_2\big(o_{ik}[x_1,x_2],\,o_{kj}[x_2,x_3]\big) \qquad (14)$$

yields the overlap function [42] as the link weight func- 
tion: 

$$o_{ij}[x_1,x_3] = \sum_{k=1}^{n_{x_2}} \min\big(o_{ik}[x_1,x_2],\,o_{kj}[x_2,x_3]\big) \qquad (15)$$

This can be defined as a matrix operation "OVL":

$$O[x_1,x_3] = \mathrm{OVL}\big(O[x_1,x_2],\,O[x_2,x_3]\big) \qquad (16)$$
[Figure 9 image; node groups labeled: terms, papers, reference authors]

FIG. 9: Example of cascaded bipartite networks with non-binary link weights. Terms to paper network cascaded with paper to reference author network.



Discussion of the application and characteristics of this 
function can be found in [27]. 

As an example, assume that $x_1$, $x_2$, and $x_3$ are linguistic terms, papers and reference authors, respectively, as shown in Figure 9. The matrix O[t,p] lists the occurrence counts of the individual terms with each paper:



$$O[t,p] = \begin{bmatrix}3&5\\2&6\\1&9\end{bmatrix} \qquad (17)$$



and the matrix O[p,ar] lists the associations of individual papers with each reference author:

$$O[p,ar] = \begin{bmatrix}2&3&0\\0&4&1\end{bmatrix} \qquad (18)$$



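Using the overlap function to calculate the link weights of O[t,ar]:

$$O[t,ar] = \mathrm{OVL}\big(O[t,p],\,O[p,ar]\big) = \mathrm{OVL}\left(\begin{bmatrix}3&5\\2&6\\1&9\end{bmatrix},\,\begin{bmatrix}2&3&0\\0&4&1\end{bmatrix}\right) = \begin{bmatrix}2&7&1\\2&6&1\\1&5&1\end{bmatrix} \qquad (19)$$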
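A minimal sketch of the OVL operation, using a hypothetical `reduce_cascade` helper with Python's built-in `min` as the path weight function, reproduces Equation (19). The second call swaps in the product path weight of Equation (6) on the same weighted inputs, for contrast; the resulting numbers are products of occurrence counts rather than counts themselves.

```python
from operator import mul


def reduce_cascade(A, B, path_weight, combine=sum):
    """Generic cascade reduction: combine(path_weight(a_ik, b_kj) over k) for each (i, j)."""
    n_mid = len(B)
    return [
        [combine(path_weight(A[i][k], B[k][j]) for k in range(n_mid))
         for j in range(len(B[0]))]
        for i in range(len(A))
    ]


def ovl(A, B):
    """Overlap link weight function, Equation (15): sum over k of min(a_ik, b_kj)."""
    return reduce_cascade(A, B, min)


O_t_p = [[3, 5], [2, 6], [1, 9]]   # Equation (17)
O_p_ar = [[2, 3, 0], [0, 4, 1]]    # Equation (18)
O_t_ar = ovl(O_t_p, O_p_ar)
print(O_t_ar)                      # [[2, 7, 1], [2, 6, 1], [1, 5, 1]]

# For contrast: the product/sum link weight function on the same weighted inputs.
print(reduce_cascade(O_t_p, O_p_ar, mul))   # [[6, 29, 5], [4, 30, 6], [2, 39, 9]]
```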



E. Link weight function using the inverse Minkowski function

The inverse Minkowski function, an adaptation of the 
well-known Minkowski distance metric [14], can be used 
when it is desired to model path weights as if the link 
weights were electrical conductances in series. In this 
case use the inverse Minkowski metric as the path weight 
function: 



$$f_2\big(o_{ik}[x_1,x_2],\,o_{kj}[x_2,x_3]\big) = \Big(o_{ik}[x_1,x_2]^{-p} + o_{kj}[x_2,x_3]^{-p}\Big)^{-1/p} \qquad (20)$$



where $0 < p \le \infty$. Note that, in contrast to the Minkowski metric as normally expressed, the exponents in the inverse Minkowski metric are negative. This function always generates a path weight that is less than or equal to the smallest link weight in the path, modeling a situation where indirect links tend to be weaker than direct links. Using a path combining function that sums the path weights:

$$f_1 = \sum_{k=1}^{n_{x_2}} f_2\big(o_{ik}[x_1,x_2],\,o_{kj}[x_2,x_3]\big) \qquad (21)$$

yields the final inverse Minkowski link weight function: 

$$o_{ij}[x_1,x_3] = \sum_{k=1}^{n_{x_2}} \Big(o_{ik}[x_1,x_2]^{-p} + o_{kj}[x_2,x_3]^{-p}\Big)^{-1/p} \qquad (22)$$

This can be defined as a matrix operation 
"INVMINK": 

$$O[x_1,x_3] = \mathrm{INVMINK}\big(O[x_1,x_2],\,O[x_2,x_3]\big) \qquad (23)$$

When this function is used with p = ∞, Equation (20) produces the minimum of its arguments and so reverts to Equation (13), making the inverse Minkowski link weight function revert to the overlap link weight function. When p = 1, the path weight function, Equation (20), becomes:



$$f_2\big(o_{ik}[x_1,x_2],\,o_{kj}[x_2,x_3]\big) = \frac{1}{\dfrac{1}{o_{ik}[x_1,x_2]} + \dfrac{1}{o_{kj}[x_2,x_3]}} \qquad (24)$$



This makes the path weight function produce a value that is one half of the harmonic average of the link weights of the path, i.e., the product of the two link weights divided by their sum. This is equivalent to calculating the path weight by modeling the link weights as electrical conductances in series.

The inverse Minkowski path weight function always produces a path weight that is no greater than the smallest weight on the path, and strictly less for finite p and positive link weights. This is appropriate in situations where indirect paths should have less weight than direct paths, and mathematically expresses a perceived diffusion, or weakening, of the strength of linkage when linkage is indirect.
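A minimal sketch of the inverse Minkowski link weight function follows. It assumes p > 0 and treats a zero link weight as a missing link whose path weight is 0; the formula itself is undefined at zero, so that convention is an assumption made here. The function names are hypothetical. The sketch checks the special cases discussed above: p = 1 gives one half of the harmonic mean, path weights never exceed the smaller link weight, and p = ∞ reproduces the overlap result of Equation (19).

```python
import math


def inv_minkowski_path(a, b, p):
    """Inverse Minkowski path weight, Equation (20): (a**-p + b**-p) ** (-1/p).

    Assumptions for this sketch: p > 0, and a zero link weight means the path
    does not exist, so its path weight is 0.
    """
    if p <= 0:
        raise ValueError("p must be positive")
    if a <= 0 or b <= 0:
        return 0.0
    if math.isinf(p):
        return float(min(a, b))  # limiting case p -> infinity: Equation (13)
    return (a ** -p + b ** -p) ** (-1.0 / p)


def invmink(A, B, p):
    """INVMINK link weight function, Equation (22): sum of path weights over k."""
    n_mid = len(B)
    return [
        [sum(inv_minkowski_path(A[i][k], B[k][j], p) for k in range(n_mid))
         for j in range(len(B[0]))]
        for i in range(len(A))
    ]


# p = 1: a*b/(a + b), one half of the harmonic mean 2ab/(a + b), Equation (24).
print(round(inv_minkowski_path(3, 6, 1), 9))   # 2.0 (the harmonic mean of 3 and 6 is 4)

# Path weights stay below the smaller link weight and approach it as p grows.
for p in (1, 2, 10, 100):
    print(p, round(inv_minkowski_path(3, 6, p), 4))

# With p = infinity, INVMINK reduces to OVL on the example of Equations (17)-(19).
O_t_p = [[3, 5], [2, 6], [1, 9]]
O_p_ar = [[2, 3, 0], [0, 4, 1]]
print(invmink(O_t_p, O_p_ar, math.inf))   # [[2.0, 7.0, 1.0], [2.0, 6.0, 1.0], [1.0, 5.0, 1.0]]
```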



F. Weights in unipartite co-occurrence networks 

Co-occurrence networks are weighted unipartite networks of like entities in which the weight of the link between a pair of entities is the number of common secondary entities to which both primary entities link. For example, in a bibliographic coupling network the nodes are papers, and the link weights are the number of references cited in common by each pair of papers. A co-occurrence matrix is the adjacency matrix of a co-occurrence network. For binary occurrence matrices, the co-occurrence matrix can be found by post-multiplying the occurrence matrix by its transpose. Using Equation (2):



$$C[x_1,x_2] = O[x_1,x_2]\cdot O[x_2,x_1] \qquad (25)$$



where C[x_1,x_2] is the co-occurrence matrix listing the number of common associations of pairs of $x_1$ entities with $x_2$ entities. For example, to calculate the co-occurrence of papers by their links to references, use the paper to reference matrix of the example collection in the Appendix, Equation (A.1):

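$$C[p,r] = O[p,r]\cdot O[r,p] = \begin{bmatrix}1&1&1&0&0&0&0&0&0&0\\1&0&1&1&1&1&0&0&0&0\\1&0&1&0&1&0&1&1&0&0\\1&1&0&1&0&0&1&0&1&1\end{bmatrix}\cdot\begin{bmatrix}1&1&1&1\\1&0&0&1\\1&1&1&0\\0&1&0&1\\0&1&1&0\\0&1&0&0\\0&0&1&1\\0&0&1&0\\0&0&0&1\\0&0&0&1\end{bmatrix} = \begin{bmatrix}3&2&2&2\\2&5&3&2\\2&3&5&2\\2&2&2&6\end{bmatrix} \qquad (26)$$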



The diagonal of the co-occurrence matrix C[x_1,x_2] lists the number of links that each $x_1$ entity has with entities of the $x_2$ entity-type. For example, in the bibliographic coupling matrix C[p,r] calculated in Equation (26), the diagonal lists the number of references each paper cites.
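The bibliographic coupling calculation of Equations (25) and (26) can be sketched in plain Python. The helper names `matmul`, `transpose` and `co_occurrence` are illustrative only, and O[p,r] is the Appendix matrix of Equation (A.1).

```python
def transpose(M):
    """Return the transpose of a list-of-lists matrix."""
    return [list(col) for col in zip(*M)]


def matmul(A, B):
    """Plain matrix product: entry (i, j) is the sum over k of A[i][k] * B[k][j]."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def co_occurrence(O):
    """Co-occurrence matrix of a binary occurrence matrix, Equation (25): O . O^T."""
    return matmul(O, transpose(O))


O_p_r = [
    [1, 1, 1, 0, 0, 0, 0, 0, 0, 0],  # p1
    [1, 0, 1, 1, 1, 1, 0, 0, 0, 0],  # p2
    [1, 0, 1, 0, 1, 0, 1, 1, 0, 0],  # p3
    [1, 1, 0, 1, 0, 0, 1, 0, 1, 1],  # p4
]
C_p_r = co_occurrence(O_p_r)  # bibliographic coupling, Equation (26)
for row in C_p_r:
    print(row)
# [3, 2, 2, 2]
# [2, 5, 3, 2]
# [2, 3, 5, 2]
# [2, 2, 2, 6]
print([C_p_r[i][i] for i in range(len(C_p_r))])  # references per paper: [3, 5, 5, 6]
```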

Computation of co-occurrences can be viewed, similar 
to the discussion of Section III E, as the calculation of link 
weights in a cascade of two bipartite networks. Given a 
bipartite network of two unlike entity-types, mirror the 
network across the secondary entity-type partition to ob- 
tain a cascade of two networks. For example, the paper to 
reference network shown in Figure 2 has been mirrored on 
the references to produce the paper-reference-paper cas- 
cade of two bipartite networks shown in Figure 10 (a). 
Calculating the weights of this cascade using matrix mul- 
tiplication will produce the co-occurrence counts of pa- 
pers' links to references, bibliographic coupling strength 
[28], as was done in Equation (26). 

The same network of Figure 2 can be mirrored on the papers to produce the reference-paper-reference cascade of bipartite networks shown in Figure 10(b). Calculating the link weights in this network using matrix multiplication yields the co-occurrence counts of references' links to papers, the co-citation strength [43]. Note that each occurrence matrix has two co-occurrence matrices associated with it. Figure 11 illustrates this for a sample paper to reference occurrence matrix, O[p,r]. To the right of O[p,r] is the square symmetric bibliographic coupling matrix C[p,r], whose size is the number of papers in O[p,r]. Similarly, below O[p,r] is the square symmetric co-citation matrix C[r,p], whose size is the number of references in O[p,r].

[Figure 10 image; panels: (a) Bibliographic coupling, (b) Co-citation]

FIG. 10: Mirror of paper to reference bipartite network to calculate weights in a unipartite co-occurrence network as a cascade of two bipartite networks. (a) Mirror across references to calculate bibliographic coupling. (b) Mirror across papers to calculate co-citation.
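Since every occurrence matrix carries two co-occurrence matrices, a sketch that returns both is a natural extension of the calculation above. The helper name `co_occurrence_pair` is hypothetical. Applied to the Appendix O[p,r], the first matrix is the 4 by 4 bibliographic coupling matrix and the second is the 10 by 10 co-citation matrix, whose diagonal counts the papers citing each reference.

```python
def transpose(M):
    """Return the transpose of a list-of-lists matrix."""
    return [list(col) for col in zip(*M)]


def matmul(A, B):
    """Plain matrix product of two list-of-lists matrices."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def co_occurrence_pair(O):
    """Both co-occurrence matrices of a binary occurrence matrix O[x1, x2]:
    O . O^T (mirror across the x2 entities) and O^T . O (mirror across the x1 entities)."""
    Ot = transpose(O)
    return matmul(O, Ot), matmul(Ot, O)


O_p_r = [
    [1, 1, 1, 0, 0, 0, 0, 0, 0, 0],  # p1
    [1, 0, 1, 1, 1, 1, 0, 0, 0, 0],  # p2
    [1, 0, 1, 0, 1, 0, 1, 1, 0, 0],  # p3
    [1, 1, 0, 1, 0, 0, 1, 0, 1, 1],  # p4
]
bib_coupling, co_citation = co_occurrence_pair(O_p_r)
print(len(bib_coupling), len(co_citation))     # 4 10
print([co_citation[j][j] for j in range(10)])  # [4, 2, 3, 2, 2, 1, 2, 1, 1, 1]
print(co_citation[0][2])                       # 3: r1 and r3 are co-cited by p1, p2 and p3
```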



Linguistic terms to paper networks, reference author to paper networks and reference journal to paper networks are weighted networks. For such networks it is not desirable to calculate co-occurrence matrices using matrix multiplication, because the product of two occurrence counts is not itself a count of anything, so the resulting link weights cannot be readily interpreted. Noting that calculation of co-occurrence matrices is analogous to computing link weights for a pair of cascaded bipartite networks, as was demonstrated in Figure 10 and the discussion above, other link weight functions can be used to find their co-occurrence matrices. This can be done, for example, using the overlap function of Section IV D.



As an example, assume the paper to linguistic term matrix:

O[p,t] =
8 9 5 3 1
5 4 9 2 1
2 6 5 4
1 1 5 2 5
(27)



Using the overlap function, the co-occurrence matrix of papers linked to terms is:

$$C[p,t] = \mathrm{OVL}\big(O[p,t],\,O[t,p]\big) \qquad (28)$$

FIG. 11: Diagram showing that each occurrence matrix is associated with a pair of co-occurrence matrices. The upper left matrix is the paper to reference occurrence matrix O[p,r]; below it is the reference co-occurrence matrix relative to papers (co-citation matrix), C[r,p]. The upper right matrix is the paper co-occurrence matrix relative to references (bibliographic coupling matrix), C[p,r].



V. RECURSIVE MATRIX GROWTH 

The recursive growth equations presented in this section are a natural outgrowth of the proposed matrix-based mathematical treatment of collections of journal papers. They are useful for the purpose of providing insight into the character of occurrence distributions in the collections, as will be explained.

The basic record in a collection of journal papers is the paper. The collection grows paper by paper in the temporal order of the publication dates of the papers. When a new paper is added, it is linked both to entities already in the collection and to new entities that enter the collection with it, e.g., new paper authors, new references, and new terms.

This section will present a recursive model of the 
growth of both occurrence and co-occurrence matrices as 
papers are added to the collection. The recursive model 
of matrix growth is found by examination of matrix parti- 
tions in occurrence and co-occurrence matrices as papers 
are added to the collection. 

It is easiest to consider the growth of an example occurrence matrix. For convenience, the paper-reference matrix will be studied. The results can be easily extended to other occurrence matrices, for example the paper to paper author matrix [35]. In the matrix, the rows correspond to papers and are ordered in the sequence of publication of the papers to which they correspond. The columns correspond to references and are ordered in the sequence in which their corresponding references first appear. As shown in Figure 12, the matrix contains a descending stair step sequence of ones from its upper left corner diagonally to its lower right corner. This sequence of ones corresponds to the initial appearance of references as papers are added to the collection. Below this diagonal sequence of ones is a roughly lower triangular region sparsely populated with ones that correspond to citations to existing references as each paper is added. Above the diagonal sequence of ones is a roughly upper triangular area of zeros.

[Figure 12 image: an example 0/1 matrix with papers as rows and references as columns; labeled blocks: old paper-reference matrix $\Omega_i$, all zeros $\mathbf{0}$, and, in the new paper's row, cites to old references $s_i$ and cites to new references $\mathbf{1}$]

FIG. 12: Diagram of the structure of a paper to reference matrix.

Considering the collection of journal papers dynamically, the collection grows from an initial paper by sequential addition of papers in the order in which they were published. In this sense the paper-reference matrix Ω grows dynamically, one paper at a time. Let i be the number of papers, and let $nr_i$ be the number of references that have appeared in all papers up to and including paper i. Let $\Omega_i$, whose size is i by $nr_i$, be the paper-reference matrix after the addition of paper i, and consider the addition of paper i + 1. A new row vector is added to $\Omega_i$. This vector is partitioned into $s_i$, a 1 by $nr_i$ vector listing the paper's citations to existing references, and $\mathbf{1}$, a 1 by $nr_{i+1} - nr_i$ vector of ones occurring in the new columns added for the new references that have appeared in paper i + 1. Figure 12 shows a pictorial representation of this addition. In the new columns above the new row, $\mathbf{0}$, an i by $nr_{i+1} - nr_i$ zero matrix, appears. The recursive matrix equation for growth of the paper-reference matrix is:

$$\Omega_{i+1} = \begin{bmatrix}\Omega_i & \mathbf{0}\\ s_i & \mathbf{1}\end{bmatrix} \qquad (29)$$
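The recursion in Equation (29) can be sketched directly: pad the existing rows with the zero block and append the new row $[s_i, \mathbf{1}]$. In this sketch the function name is hypothetical and the Appendix citations are labeled by their reference numbers r1 to r10 for illustration. Building Ω paper by paper reproduces the paper to reference matrix of Equation (A.1), including its stair step of first appearances.

```python
def grow_paper_reference_matrix(papers):
    """Build the paper-reference matrix Omega one paper at a time, as in Equation (29).

    `papers` lists each paper's cited references (any hashable IDs) in publication
    order. A reference gets a new column the first time it is cited.
    """
    column = {}   # reference ID -> column index, in order of first appearance
    omega = []    # rows of Omega_i
    for refs in papers:
        new = [r for r in dict.fromkeys(refs) if r not in column]
        s_i = [0] * len(column)             # citations to existing references
        for r in refs:
            if r in column:
                s_i[column[r]] = 1
        for r in new:                       # new columns for first-time references
            column[r] = len(column)
        for row in omega:                   # zero block 0 above the new columns
            row.extend([0] * len(new))
        omega.append(s_i + [1] * len(new))  # new row [s_i, 1]
    return omega, list(column)


# Reference lists of the Appendix papers p1..p4, in publication order.
papers = [
    ["r1", "r2", "r3"],
    ["r1", "r3", "r4", "r5", "r6"],
    ["r1", "r3", "r5", "r7", "r8"],
    ["r1", "r2", "r4", "r7", "r9", "r10"],
]
omega, refs = grow_paper_reference_matrix(papers)
for row in omega:
    print(row)
```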



Figure 13 shows a map of a typical paper-reference ma- 
trix, where each dot shows the location of a one in the 
matrix. 

As papers are added to the collection, note that individual papers collect no links after their initial appearance, while references cumulate links (citations from newly appearing papers) as papers are added. Entity-types that cumulate links in collections of papers usually have a power-law frequency distribution relative to papers. Three such power-law distributions are well known: 1) the papers per paper author distribution (Lotka's law) [47], 2) the papers per paper journal distribution (Bradford's law) [47], and 3) the papers per reference distribution (reference power law) [37]. Papers, which don't cumulate links, tend to have exponential-tailed distributions relative to other entity-types. Two examples are the authors per paper distribution (1-shifted Poisson) [35] and the references per paper distribution (lognormal) [32].

[Figure 13 image; horizontal axis: references in order of appearance]

FIG. 13: Example of a typical paper to reference matrix.

The bibliographic coupling matrix, which will be designated β, is a symmetric matrix that lists the bibliographic coupling counts of all pairs of papers within the data collection. The diagonal of β contains the counts of the number of references cited in each paper. The bibliographic coupling matrix can be obtained by multiplying the paper-reference matrix by its transpose:

$$\beta = \Omega\cdot\Omega^T \qquad (30)$$

The recursive growth equations for the bibliographic coupling matrix can be derived by substituting (29) into (30):



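$$\beta_{i+1} = \Omega_{i+1}\Omega_{i+1}^T = \begin{bmatrix}\Omega_i & \mathbf{0}\\ s_i & \mathbf{1}\end{bmatrix}\begin{bmatrix}\Omega_i^T & s_i^T\\ \mathbf{0}^T & \mathbf{1}^T\end{bmatrix} = \begin{bmatrix}\Omega_i\Omega_i^T & \Omega_i s_i^T\\ s_i\Omega_i^T & s_i s_i^T + \mathbf{1}\mathbf{1}^T\end{bmatrix} = \begin{bmatrix}\beta_i & \Omega_i s_i^T\\ s_i\Omega_i^T & m_{i+1}\end{bmatrix} \qquad (31)$$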



where $m_{i+1}$ is the number of references cited by paper i + 1. Figure 14 shows a pictorial representation of a typical bibliographic coupling matrix with the partitions in Equation (31) identified. It is easy to see from Equation (31) and Figure 14 that bibliographic coupling counts between pairs of papers are static and do not change as more papers are added to the collection: the block $\beta_i$ reappears unchanged in the upper left corner of $\beta_{i+1}$.
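The block update of Equation (31) can be checked numerically. In this sketch the helper names are hypothetical, $\Omega_3$ is the Appendix paper to reference matrix after the first three papers (restricted to the eight references seen so far), and paper p4 contributes $s_3$ plus two new references. The updated matrix should equal $\Omega_4\Omega_4^T$, the bibliographic coupling matrix of Equation (26).

```python
def transpose(M):
    """Return the transpose of a list-of-lists matrix."""
    return [list(col) for col in zip(*M)]


def matmul(A, B):
    """Plain matrix product of two list-of-lists matrices."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def bib_coupling_step(beta, omega, s, n_new):
    """One step of Equation (31).

    beta: beta_i (i x i); omega: Omega_i (i x nr_i); s: s_i (length nr_i);
    n_new: nr_{i+1} - nr_i, the length of the ones vector.
    """
    cross = [sum(a * b for a, b in zip(row, s)) for row in omega]  # Omega_i s_i^T
    m_next = sum(x * x for x in s) + n_new                         # s_i s_i^T + 1 1^T
    top = [b_row + [c] for b_row, c in zip(beta, cross)]           # beta_i carried over unchanged
    return top + [cross + [m_next]]                                # bottom row: s_i Omega_i^T, m_{i+1}


# Omega_3: papers p1..p3 of the Appendix, over the nr_3 = 8 references seen so far.
omega3 = [
    [1, 1, 1, 0, 0, 0, 0, 0],
    [1, 0, 1, 1, 1, 1, 0, 0],
    [1, 0, 1, 0, 1, 0, 1, 1],
]
beta3 = matmul(omega3, transpose(omega3))
s3 = [1, 1, 0, 1, 0, 0, 1, 0]                          # p4 cites r1, r2, r4, r7 ...
beta4 = bib_coupling_step(beta3, omega3, s3, n_new=2)  # ... plus new references r9, r10
for row in beta4:
    print(row)
# [3, 2, 2, 2]
# [2, 5, 3, 2]
# [2, 3, 5, 2]
# [2, 2, 2, 6]
```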

FIG. 14: Diagram of a bibliographic coupling matrix.

The co-citation matrix, designated Γ, is a symmetric nr by nr matrix that lists the co-citation counts of all pairs of references within the data collection. The diagonal of Γ contains the counts of the number of papers that cite each reference. The co-citation matrix can be obtained by multiplying the transpose of the paper-reference matrix by itself:

$$\Gamma = \Omega^T\cdot\Omega \qquad (32)$$

The recursive growth equations for the co-citation matrix can be derived by substituting Equation (29) into Equation (32):

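$$\Gamma_{i+1} = \Omega_{i+1}^T\Omega_{i+1} = \begin{bmatrix}\Omega_i^T\Omega_i + s_i^T s_i & s_i^T\mathbf{1}\\ \mathbf{1}^T s_i & \mathbf{1}^T\mathbf{1}\end{bmatrix} = \begin{bmatrix}\Gamma_i + s_i^T s_i & s_i^T\mathbf{1}\\ \mathbf{1}^T s_i & \mathbf{1}^T\mathbf{1}\end{bmatrix} \qquad (33)$$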

Figure 15 shows a pictorial representation of a typical 
co-citation matrix with the partitions in Equation (33) 
identified. It is easy to see that the co-citation count 
between two references is not static, but can be increased 
with the addition of each new paper to the collection. 
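Similarly, a sketch of the co-citation update of Equation (33), using the same Appendix data and hypothetical helper names, shows a co-citation count increasing when a new paper arrives: references r1 and r2 are co-cited once by p1 and once more by p4.

```python
def transpose(M):
    """Return the transpose of a list-of-lists matrix."""
    return [list(col) for col in zip(*M)]


def matmul(A, B):
    """Plain matrix product of two list-of-lists matrices."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def cocitation_step(gamma, s, n_new):
    """One step of Equation (33): Gamma_{i+1} from Gamma_i, s_i and nr_{i+1} - nr_i."""
    n_old = len(s)
    # Upper blocks: [Gamma_i + s_i^T s_i, s_i^T 1]
    top = [[gamma[a][b] + s[a] * s[b] for b in range(n_old)] + [s[a]] * n_new
           for a in range(n_old)]
    # Lower blocks: [1^T s_i, 1^T 1]
    bottom = [list(s) + [1] * n_new for _ in range(n_new)]
    return top + bottom


omega3 = [
    [1, 1, 1, 0, 0, 0, 0, 0],
    [1, 0, 1, 1, 1, 1, 0, 0],
    [1, 0, 1, 0, 1, 0, 1, 1],
]
s3 = [1, 1, 0, 1, 0, 0, 1, 0]
gamma3 = matmul(transpose(omega3), omega3)
gamma4 = cocitation_step(gamma3, s3, n_new=2)
print(gamma3[0][1], gamma4[0][1])  # 1 2
```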

VI. EXAMPLE 

An illustrative example of the techniques outlined here 
uses a collection of 902 papers on the topic of complex 
network theory. This collection was gathered in 2003 by 
finding all papers that cite key references in the specialty. 
A detailed analysis of the paper to reference network for this collection was presented by Morris [32], while analysis of the paper author to paper network for this collection was presented by Goldstein et al. [23] and Morris et al. [35].

FIG. 15: Diagram of a co-citation matrix.

Figure 16 shows a weighted occurrence matrix, O[ap,ar], for the paper author to reference author network from this collection. In this diagram, the paper authors are rows, reference authors are columns, and the size of the circle at position (i, j) in the diagram is proportional to the link weight from paper author i to reference author j. In this case the link weight is equal to the number of times that paper author i cited reference author j.

In order to visualize the structure of links in the net- 
work, the rows and columns of the matrix have been 
arranged using a seriation algorithm [34] and clustering 
dendrograms have been added on the left and top of the 
figure [36]. The figure is meant to show collaboration 
groups of paper authors and their links to reference au- 
thors as symbols of 'schools of thought' [48]. The visual- 
ization technique of Figure 16 is explained in Morris and 
Yen [36]. 

Only paper authors who authored 6 or more papers were visualized. For clustering paper authors, the co-occurrence matrix of co-authorship counts, C[ap,p], was calculated using matrix multiplication: C[ap,p] = O[ap,p]·O[p,ap]. These co-authorship counts were converted to distances and a hierarchical clustering routine was applied to produce the dendrogram on the left of the figure. Groups of paper authors clustered this way can be regarded as 'research teams.'
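On the Appendix collection, the co-authorship step above reduces to a small matrix product. The sketch below computes C[ap,p] and applies the paper-count filter. The threshold of 2 papers (suited to this 4-paper toy example) and the cosine-style conversion from counts to distances are assumptions made here; the text does not state which conversion was used, and the clustering routine itself is not shown.

```python
import math


def transpose(M):
    """Return the transpose of a list-of-lists matrix."""
    return [list(col) for col in zip(*M)]


def matmul(A, B):
    """Plain matrix product of two list-of-lists matrices."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


# Appendix Equation (A.2); columns ap1 Beeblebrox, ap2 Dent, ap3 Prefect.
O_p_ap = [
    [1, 0, 0],  # p1
    [1, 1, 1],  # p2
    [0, 0, 1],  # p3
    [0, 0, 1],  # p4
]
C_ap_p = matmul(transpose(O_p_ap), O_p_ap)  # C[ap,p] = O[ap,p] . O[p,ap]
print(C_ap_p)                               # [[2, 1, 1], [1, 1, 1], [1, 1, 3]]

# Keep authors with at least MIN_PAPERS papers (the diagonal of C[ap,p]).
MIN_PAPERS = 2  # assumption for this toy example; the text uses 6 on the full collection
keep = [i for i in range(len(C_ap_p)) if C_ap_p[i][i] >= MIN_PAPERS]
print(keep)                                 # [0, 2]


def to_distance(c_ij, c_ii, c_jj):
    """Hypothetical count-to-distance conversion: 1 minus a cosine-style similarity."""
    return 1.0 - c_ij / math.sqrt(c_ii * c_jj)


D = [[to_distance(C_ap_p[i][j], C_ap_p[i][i], C_ap_p[j][j]) for j in keep] for i in keep]
print([[round(d, 4) for d in row] for row in D])  # [[0.0, 0.5918], [0.5918, 0.0]]
```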

Only reference authors who were cited 50 or more times were visualized. For clustering reference authors, the co-occurrence matrix of co-citation counts, C[ar,p], was calculated using the overlap function: C[ar,p] = OVL(O[ar,p], O[p,ar]). These co-citation counts were converted to distances and a hierarchical clustering routine was applied to produce the dendrogram at the top of the figure. Groups of reference authors clustered this way can be regarded as representing 'schools of thought.'

The paper author to reference author matrix, O[ap,ar], was calculated using matrix multiplication: O[ap,ar] = O[ap,p]·O[p,r]·O[r,ar]. The matrix clearly shows that dominant reference authors in the specialty, who are cited by authors to represent key ideas in the specialty, are heavily linked across all paper authors. Note that there is evidence of correlation of groups of paper authors to groups of reference authors. For example, paper authors Choi, Hong, Kim and Holme are all heavily connected to reference authors Newman and Watts, while paper authors Pastor-Satorras, Vespignani, Vazquez, and Moreno are all heavily connected to reference authors Pastor-Satorras and Albert.

FIG. 16: Visualization of the occurrence matrix of a weighted paper author to reference author network from a collection of papers from the specialty of complex networks theory.



This example illustrates the usefulness of the matrix-based mathematical treatment of cascades of bipartite networks in collections of journal papers. In the example, we have shown that this treatment can be used to construct weighted unipartite co-occurrence networks for clustering purposes: 1) paper authors linked by co-authorship, and 2) reference authors linked by common papers. Additionally, the method was used to calculate a weighted bipartite network of paper authors to reference authors.



VII. CONCLUSION 

We have introduced several valuable methods that can 
be used to apply complex networks theory to collections 
of journal papers: 

• The structural model of coupled bipartite 
networks for collections of papers. This is a 
novel model that allows analysis of any bipartite 
network in the collection in a general, standard- 
ized, manner. Further, it allows building a multi- 
ple entity-type growth model of this system of net- 
works, a technique not generally studied by com- 
plex networks researchers. 

• The matrix-based method of calculating weighted bipartite networks. Using the general concept of link weight functions, we have shown that this matrix-based technique can be applied to cascades of unweighted bipartite networks using matrix multiplication. Additionally, the technique can be applied to cascades of weighted bipartite networks using the overlap function or the inverse Minkowski function.

• The calculation of weighted unipartite 
co-occurrence networks. Considering co- 
occurrence networks as coupled bipartite networks 
made by mirroring around a bipartite partition, 
calculation of weighted co-occurrence networks 
uses the same matrix-based calculation method as 
weighted bipartite networks. 

• The construction of simple models of 
weighted matrix growth. This structural model 
of coupled bipartite networks, when considered 
with unweighted bipartite growth models, such as 
the bipartite Yule model, yields a simple model of 
growth of weighted bipartite networks and weighted 
unipartite co-occurrence networks. Morris [33] 
has shown that simple bipartite Yule processes ef- 
fectively simulate the statistics of bipartite and 
weighted unipartite networks in collections of pa- 
pers. 

The structural model and matrix-based techniques in- 
troduced here provide a unified framework of all entities 
in networks of papers, e.g., paper to author networks 
that are manifestations of social collaboration processes, 
or paper to reference networks that are manifestations 
of epistemological processes such as knowledge accretion 
and exemplar knowledge in a specialty. Such networks 
are often studied as decoupled processes despite their al- 
most certain interdependence. For example, note that 
the paper author to reference author network example 
of Figure 16 shows correlations between groups of pa- 
per authors and groups of reference authors. A realis- 
tic model of processes in a research specialty should be 
able to predict that such correlations will occur, but the 
model must also predict the characteristics of the paper 
author to paper network (such as Lotka's law), and simultaneously predict the characteristics of the paper to reference network (such as the reference power law). All
of these bipartite networks are interdependent and those 
interdependencies cannot be modeled using simple uni- 
partite or bipartite growth models. The structural model 
introduced here is a step toward modeling the complex 
interdependencies in a research specialty. 

Furthermore, and importantly, these techniques can be applied to other report-based structures that can be expressed as collections of entities. For example, a collection of intelligence reports about terrorist events can, after application of an entity extraction program, be expressed as a collection of entities: reports, place names, terrorist group leader names, terrorist group names, government officials' names, and incident types. These entities are linked in a coupled bipartite structure similar to that of Figure 1, and analysis of those linkages could produce useful information about networks of terrorists. The structural model introduced here may therefore allow the study of other self-organizing social organizations as well, through their manifestations in collections of reports.
APPENDIX: EXAMPLE COLLECTION OF JOURNAL PAPERS

1. ISI tags [1]

The table below explains the tags used in the ISI source 
file given in this appendix. 

PT Publication type 
AU Author 
TI Title 

SO Source journal 
ID Index terms 
CR Cited reference 
PY Published year 
VL Volume 
BP Beginning page 
ER End of record 



2. Source file 

Below, in ISI tagged file format, are listed four records 
comprising a fictitious collection of papers on the fictional 
specialty of improbability generation: 

FN ISI Export Format
PT J
AU Beeblebrox, Z
TI Review of finite improbability generators
SO Bambleweeny Review
ID FINITE IMPROBABILITY; LIFE; UNIVERSE
CR FORD P, 1996, J LIFE UNIV EVERY, V46, P111
   MOUSE B, 1997, REV FUT PHYS, V27, P76
   MOUSE B, 1998, BISTROMATH, V991, P342
PY 2003
VL 13
BP 844
ER

PT J
AU Beeblebrox, Z
   Dent, A
   Prefect, F
TI Dentrassi hot tea: a scale free
   brownian motion generator
SO Journal of Life, the Universe and Everything
ID FINITE IMPROBABILITY; ULTIMATE QUESTION;
   EVERYTHING; SCALE FREE NETWORKS
CR FORD P, 1996, J LIFE UNIV EVERY, V46, P111
   MOUSE B, 1998, BISTROMATH, V991, P342
   TRILLIAN A, 2000, SIRIAN CYBERN J, V82, P675
   TRILLIAN A, 2002, BISTROMATH, V995, P937
   BEEBLEBROX Z, 1994, REV FUT PHYS, V24, P923
PY 2003
VL 56
BP 738
ER



PT J
AU Prefect, F
TI Application of infinite improbability to
   spacecraft propulsion
SO Bistromathematica
ID INFINITE IMPROBABILITY; LIFE; UNIVERSE; EVERYTHING
CR FORD P, 1996, J LIFE UNIV EVERY, V46, P111
   MOUSE B, 1998, BISTROMATH, V991, P342
   TRILLIAN A, 2002, BISTROMATH, V995, P937
   BEEBLEBROX Z, 2003, BAMBLEWEENY REV, V13, P844
   BEEBLEBROX Z, 1989, PRINCIPLES OF IMPROBAPHYSICS
PY 2004
VL 997
BP 938
ER

PT J
AU Prefect, F
TI Power laws in infinite improbability networks
SO Proc of Vogonian Academy of Science
ID INFINITE IMPROBABILITY; ULTIMATE QUESTION;
   EVERYTHING; SCALE FREE NETWORKS
CR FORD P, 1996, J LIFE UNIV EVERY, V46, P111
   MOUSE B, 1997, REV FUT PHYS, V27, P76
   TRILLIAN A, 2000, SIRIAN CYBERN J, V82, P675
   BEEBLEBROX Z, 2003, BAMBLEWEENY REV, V13, P844
   SLARTIBARTFAST B, 2001, GALACT J PHYS, V887, P2846
   ZARNIWOOP N, 1978, MEGADODO MAG, V564, P23
PY 2004
VL 83
BP 944
ER
EF

3. Extracted entities

The table below lists the entities extracted from the collection of papers above.

Papers (identified by title)
p1: Review of finite...
p2: Dentrassi hot tea: a scale free...
p3: Application of infinite...
p4: Power laws in infinite...

Paper authors
ap1: Beeblebrox, Z.
ap2: Dent, A.
ap3: Prefect, F

Paper journals
jp1: Bambleweeny Review
jp2: Journal of Life, the Universe and Everything
jp3: Bistromathematica
jp4: Proc of Vogonian Academy of Science

Index terms
t1: FINITE IMPROBABILITY
t2: LIFE
t3: UNIVERSE
t4: ULTIMATE QUESTION
t5: EVERYTHING
t6: SCALE FREE NETWORKS
t7: INFINITE IMPROBABILITY

References
r1: FORD P, 1996, J LIFE UNIV EVERY, V46, P111
r2: MOUSE B, 1997, REV FUT PHYS, V27, P76
r3: MOUSE B, 1998, BISTROMATH, V991, P342
r4: TRILLIAN A, 2000, SIRIAN CYBERN J, V82, P675
r5: TRILLIAN A, 2002, BISTROMATH, V995, P937
r6: BEEBLEBROX Z, 1994, REV FUT PHYS, V24, P923
r7: BEEBLEBROX Z, 2003, BAMBLEWEENY REV, V13, P844
r8: BEEBLEBROX Z, 1989, PRINCIPLES OF IMPROBAPHYSICS
r9: SLARTIBARTFAST B, 2001, GALACT J PHYS, V887, P2846
r10: ZARNIWOOP N, 1978, MEGADODO MAG, V564, P23

Reference authors
ar1: FORD P
ar2: MOUSE B
ar3: TRILLIAN A
ar4: BEEBLEBROX Z
ar5: SLARTIBARTFAST B
ar6: ZARNIWOOP N

Reference journals
jr1: J LIFE UNIV EVERY
jr2: REV FUT PHYS
jr3: BISTROMATH
jr4: SIRIAN CYBERN J
jr5: BAMBLEWEENY REV
jr6: GALACT J PHYS
jr7: MEGADODO MAG

4. Occurrence matrices

Below are the occurrence matrices for the direct bipartite networks in the collection of papers above.

Paper to reference network:

$$O[p,r] = \begin{bmatrix}1&1&1&0&0&0&0&0&0&0\\1&0&1&1&1&1&0&0&0&0\\1&0&1&0&1&0&1&1&0&0\\1&1&0&1&0&0&1&0&1&1\end{bmatrix} \qquad (A.1)$$

Paper to paper author network:

$$O[p,ap] = \begin{bmatrix}1&0&0\\1&1&1\\0&0&1\\0&0&1\end{bmatrix} \qquad (A.2)$$

Paper to paper journal network:

$$O[p,jp] = \begin{bmatrix}1&0&0&0\\0&1&0&0\\0&0&1&0\\0&0&0&1\end{bmatrix} \qquad (A.3)$$

Paper to terms network:

$$O[p,t] = \begin{bmatrix}1&1&1&0&0&0&0\\1&0&0&1&1&1&0\\0&1&1&0&1&0&1\\0&0&0&1&1&1&1\end{bmatrix} \qquad (A.4)$$

Reference to reference author network:

$$O[r,ar] = \begin{bmatrix}1&0&0&0&0&0\\0&1&0&0&0&0\\0&1&0&0&0&0\\0&0&1&0&0&0\\0&0&1&0&0&0\\0&0&0&1&0&0\\0&0&0&1&0&0\\0&0&0&1&0&0\\0&0&0&0&1&0\\0&0&0&0&0&1\end{bmatrix} \qquad (A.5)$$

Reference to reference journal network:

$$O[r,jr] = \begin{bmatrix}1&0&0&0&0&0&0\\0&1&0&0&0&0&0\\0&0&1&0&0&0&0\\0&0&0&1&0&0&0\\0&0&1&0&0&0&0\\0&1&0&0&0&0&0\\0&0&0&0&1&0&0\\0&0&0&0&0&0&0\\0&0&0&0&0&1&0\\0&0&0&0&0&0&1\end{bmatrix} \qquad (A.6)$$



Note that in some cases the paper to terms matrix may be weighted when working with abstract or title terms rather than index terms.
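The entity extraction that turns tagged records into occurrence matrices can be sketched as follows. The parser assumes the layout shown above (a two-letter tag, a space, then the value, with indented continuation lines) and is not a complete reader for the ISI export format; `parse_isi` and `occurrence_matrix` are hypothetical names. Run on the first two Appendix records (abridged), it reproduces the first two rows of O[p,ap], O[p,r] and O[p,t], restricted to the entities that appear in those two papers.

```python
SAMPLE = """\
FN ISI Export Format
PT J
AU Beeblebrox, Z
TI Review of finite improbability generators
SO Bambleweeny Review
ID FINITE IMPROBABILITY; LIFE; UNIVERSE
CR FORD P, 1996, J LIFE UNIV EVERY, V46, P111
   MOUSE B, 1997, REV FUT PHYS, V27, P76
   MOUSE B, 1998, BISTROMATH, V991, P342
PY 2003
ER

PT J
AU Beeblebrox, Z
   Dent, A
   Prefect, F
TI Dentrassi hot tea: a scale free
   brownian motion generator
SO Journal of Life, the Universe and Everything
ID FINITE IMPROBABILITY; ULTIMATE QUESTION;
   EVERYTHING; SCALE FREE NETWORKS
CR FORD P, 1996, J LIFE UNIV EVERY, V46, P111
   MOUSE B, 1998, BISTROMATH, V991, P342
   TRILLIAN A, 2000, SIRIAN CYBERN J, V82, P675
   TRILLIAN A, 2002, BISTROMATH, V995, P937
   BEEBLEBROX Z, 1994, REV FUT PHYS, V24, P923
PY 2003
ER
EF
"""

FILE_TAGS = {"FN", "VR", "EF"}  # file-level tags that do not belong to a record


def parse_isi(text):
    """Parse tagged records into a list of {tag: [values]} dictionaries.

    Assumes a two-letter tag, a space, then the value, with indented
    continuation lines belonging to the previous tag.
    """
    records, current, tag = [], {}, None
    for raw in text.splitlines():
        if not raw.strip():
            continue
        if raw[:2].strip():          # a new field starts with a tag
            tag, value = raw[:2], raw[3:].strip()
            if tag == "ER":          # end of record
                records.append(current)
                current, tag = {}, None
                continue
            if tag in FILE_TAGS:
                tag = None
                continue
        else:                        # indented continuation line
            value = raw.strip()
        if tag is not None:
            current.setdefault(tag, []).append(value)
    return records


def occurrence_matrix(records, tag, split=None):
    """Binary paper-to-entity matrix for one tag; columns in order of first appearance."""
    columns, rows = {}, []
    for rec in records:
        values = rec.get(tag, [])
        if split:
            values = [v.strip() for item in values for v in item.split(split) if v.strip()]
        rows.append({columns.setdefault(v, len(columns)) for v in values})
    matrix = [[1 if j in ids else 0 for j in range(len(columns))] for ids in rows]
    return matrix, list(columns)


records = parse_isi(SAMPLE)
O_p_ap, authors = occurrence_matrix(records, "AU")
O_p_r, references = occurrence_matrix(records, "CR")
O_p_t, terms = occurrence_matrix(records, "ID", split=";")
print(O_p_ap)  # [[1, 0, 0], [1, 1, 1]]
print(O_p_r)   # [[1, 1, 1, 0, 0, 0], [1, 0, 1, 1, 1, 1]]
print(O_p_t)   # [[1, 1, 1, 0, 0, 0], [1, 0, 0, 1, 1, 1]]
```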
http://www.isinet.com 

A set of MATLAB routines that can extract several types 
of bipartite networks from ISI tagged files is available 
from the authors. Please contact one of the authors for 
further information. 
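As a rough illustration of the kind of extraction such routines perform, the 
following Python sketch splits text in the ISI tagged format into records and 
lists paper-to-author links. It is not the authors' MATLAB code. It assumes the 
usual layout of two-letter field tags at the start of a line, indented 
continuation lines, and ER closing each record. It uses the UT accession-number 
field as the paper identifier. The function names and sample records are 
hypothetical. Each continuation line is stored as a separate value. That suits 
multi-valued fields such as AU, but it would split a wrapped title into pieces. 

```python
def parse_isi_records(text):
    """Split ISI tagged text into records (dicts: field tag -> list of values).

    Assumed layout: 'XX value' opens field XX, a line indented by three
    spaces continues the previous field, 'ER' ends a record, and the
    file-level tags FN, VR, and EF are ignored.
    """
    records, rec, tag = [], {}, None
    for line in text.splitlines():
        if line.startswith("   ") and tag is not None:
            rec[tag].append(line.strip())   # continuation: one more value
            continue
        code, value = line[:2], line[3:].strip()
        if code == "ER":
            if rec:
                records.append(rec)
            rec, tag = {}, None
        elif code in ("FN", "VR", "EF") or not code.strip():
            tag = None                      # header, footer, or blank line
        else:
            tag = code
            rec.setdefault(tag, []).append(value)
    return records

def bipartite_edges(records, tag="AU", key="UT"):
    """List (paper, entity) links for one field, e.g. paper-to-author.

    The paper is identified by its `key` field, falling back to the
    record's position when that field is missing.
    """
    edges = []
    for i, rec in enumerate(records):
        paper = rec.get(key, [str(i)])[0]
        for entity in rec.get(tag, []):
            edges.append((paper, entity))
    return edges

# Hypothetical two-record sample in the tagged format.
SAMPLE = "\n".join([
    "FN ISI Export Format",
    "VR 1.0",
    "PT J",
    "AU Smith, J",
    "   Jones, K",
    "TI A hypothetical paper",
    "UT ISI:000000001",
    "ER",
    "",
    "PT J",
    "AU Jones, K",
    "UT ISI:000000002",
    "ER",
    "",
    "EF",
])
edges = bipartite_edges(parse_isi_records(SAMPLE))
```

Passing a different tag, for example the cited-reference field, would yield 
another bipartite network from the same records in the same way. 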



